SiteQ: Engineering High Performance QA System Using Lexico-Semantic Pattern Matching and Shallow NLP
Authors
Abstract
In TREC-10, we participated in the web track (only ad-hoc task) and the QA track (only main task). In the QA track, our QA system (SiteQ) has a general architecture with three processing steps: question processing, passage selection, and answer processing. The key technique is the LSP (Lexico-Semantic Pattern), which is composed of linguistic entries and semantic types. LSP grammars constructed from various resources are used for answer type determination and answer matching. We also apply AAD (Abbreviation-Appositive-Definition) processing for queries whose answer type cannot be determined or expected, encyclopedia search to increase the matching coverage between query terms and passages, and pivot detection for calculating distances to answer candidates. We used two-level answer types consisting of 18 upper-level types and 47 lower-level types. A semantic category dictionary, WordNet, POS combined with lexicography, and a stemmer were all applied to construct the LSP knowledge base. The CSMT (Category Sense-code Mapping Table) is used to find answer types by matching semantic categories against sense-codes from WordNet. Evaluation shows that the MRR for 492 questions is 0.320 (strict), which is considerably higher than the average MRR of the other 67 runs. In the Web track, we focused on the effectiveness of both noun phrase extraction and our new PRF (Pseudo Relevance Feedback). We confirmed that our query expansion using PRF with a TSV function adapting the TF factor contributed to better performance, but noun phrases did not contribute much. More observation is needed before we can write more elaborate tag-pattern rules for constructing better noun phrases.
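The MRR figure quoted in the abstract can be read with the help of a small sketch. It assumes the usual definition of mean reciprocal rank: each question scores the reciprocal of the rank of its first correct answer, or zero if no returned answer is correct, and the scores are averaged over all questions. The function names and the sample ranks are hypothetical illustrations, not code or data from the paper.

```python
from typing import Optional, Sequence


def reciprocal_rank(rank: Optional[int]) -> float:
    """Score one question: 1/rank of its first correct answer, 0.0 if none."""
    return 0.0 if rank is None else 1.0 / rank


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average the per-question reciprocal ranks over all questions."""
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = sum(reciprocal_rank(r) for r in first_correct_ranks)
    return total / len(first_correct_ranks)


# Hypothetical ranks for five questions (None = no correct answer returned).
sample_ranks = [1, None, 2, 5, None]
print(mean_reciprocal_rank(sample_ranks))
```

Under this definition, unanswered questions pull the average down just as strongly as correct first-rank answers pull it up, so an MRR of 0.320 does not by itself say how many questions were answered at rank one.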
Similar resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
SiteQ/J: A Question Answering System for Japanese
This paper describes our Question Answering system, which participated in QAC Task1 of NTCIR3, and reports the results with some observations. By analyzing the previous TREC QA data, we defined a passage and developed a passage selection method suitable for Question Answering. Using Lexico-Semantic Patterns (LSP), we identify the answer type of a question and detect answer candidates without any deep lingu...
Aplicación de técnicas basadas en PLN al tratamiento de preguntas médicas en Búsqueda de Respuestas
Nowadays, there is an increasing interest in research on QA over restricted domains. Concretely, in this paper we will show the process of question analysis in our medical QA system. In this system we combine the use of NLP techniques and the UMLS Metathesaurus as knowledge source. The main NLP technique is the use of logic forms and the pattern matching technique in this question analysis perf...
Automatic Acquisition of Lexico-semantic Knowledge for QA
We present an experiment for finding semantically similar words on the basis of a parsed corpus of Dutch text and show that the acquired information correlates with relations found in Dutch EuroWordNet. Next, we demonstrate how the acquired knowledge can be used to boost the performance of an open-domain question answering system for Dutch. Automatically acquired lexico-semantic information is ...
Semantic Pattern for User-Interactive Question Answering
This paper proposes a new semantic pattern that users can use to post questions and answers in a user-interactive question answering (QA) system. The procedures needed to use the semantic pattern in a QA system are also presented, which include question structure analysis, pattern matching, pattern generation, pattern classification and answer extraction. A user interface of using...